Whole-genome sequencing (WGS) could potentially provide a single platform for extracting all the
information required to predict an organism's phenotype. However, its ability to provide accurate
predictions has not yet been demonstrated in large independent studies of specific organisms. In this
study, we aimed to develop a genotypic prediction method for antimicrobial susceptibilities. The whole
genomes of 501 unrelated Staphylococcus aureus isolates were sequenced, and the assembled genomes
were interrogated using BLASTn for a panel of known resistance determinants (chromosomal mutations
and genes carried on plasmids). Results were compared with phenotypic susceptibility testing for 12
commonly used antimicrobial agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline,
ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampin, and mupirocin) performed
by the routine clinical laboratory. We investigated discrepancies by repeat susceptibility testing and
manual inspection of the sequences and used this information to optimize the resistance determinant
panel and BLASTn algorithm. We then tested performance of the optimized tool in an independent
validation set of 491 unrelated isolates, with phenotypic results obtained in duplicate by automated
broth dilution (BD Phoenix) and disc diffusion. In the validation set, the overall sensitivity and
specificity of the genomic prediction method were 0.97 (95% confidence interval [95% CI], 0.95 to 0.98)
and 0.99 (95% CI, 0.99 to 1), respectively, compared to standard susceptibility testing methods. The
very major error rate was 0.5%, and the major error rate was 0.7%. WGS was as sensitive and specific as
routine antimicrobial susceptibility testing methods. WGS is a promising alternative to culture methods
for resistance prediction in S. aureus and ultimately other major bacterial pathogens.



Whole-genome sequencing (WGS) is a rapidly advancing technology, and increasingly affordable
benchtop sequencers could be in use in the routine clinical laboratory within the next decade (1). It
may soon be practical to sequence specimens directly in a matter of hours, resulting in enormous
diagnostic improvements and creating new challenges for the routine laboratory (2). One key application
is likely to be antimicrobial resistance prediction from the genome sequence ("resistance genotype"),
analogous to in silico multiplex PCR for a large number of known resistance genes. It is theoretically
possible to recover the entire complement of genes encoding resistance from the genomic sequence of an
isolate in a single step. Recently, Zankari et al. (3) and Stoesser et al. (4) reported on resistance
prediction using a simple BLAST method. Despite the limited size and lack of independent validation in
these studies, they provide intriguing hints that much phenotypic resistance may be simply explained
using genotypic prediction from WGS.

To be used confidently in clinical practice, reliable genotypic prediction of antimicrobial resistance
phenotype has to be demonstrated to the same standards as any new phenotypic method, using large
diverse sets of unrelated isolates. Comprehensively validated genotypic prediction of antimicrobial
resistance, ready for implementation in clinical practice, will require multiple large studies.
However, early investigation of the performance of known genetic determinants for resistance
prediction could establish the feasibility of this approach.

In this study, we describe a three-step approach for developing a resistance gene prediction method
using Staphylococcus aureus, with (i) initial development using easily available bioinformatics
algorithms and a "derivation set" of 501 isolates, (ii) testing and algorithm refinement, and (iii)
validation of the method in a further unrelated set of 491 isolates.

MATERIALS AND METHODS 

Creation of a catalogue of antimicrobial resistance genes. A panel of antimicrobial agents (Tables 1
and 2) was identified for investigation based on those used routinely for management of S. aureus
infections in the Oxford University Hospitals (OUH) National Health Service (NHS) Trust. To identify
the genetic determinants encoding resistance to these antimicrobial agents, a literature search was
conducted in PubMed using the medical subject heading (MeSH) terms "Staphylococcus aureus" and "Drug
resistance, microbial" and individual antimicrobial drug names to create a catalogue of antimicrobial
resistance genes and variants, using published sequences deposited in GenBank
(http://www.ncbi.nlm.nih.gov/GenBank/).

TABLE 1 Panel of genes associated with mobile genetic elements used for BLAST query^a

Antimicrobial agent(s) | Gene^b | Product^c | Reference gene accession no. (nucleotide positions)
Penicillin | blaZ | Class A beta-lactamase | BX571856.1 (1913827-1914672)
Methicillin | mecA | Low-affinity PBP2 | BX571856.1 (44919-46925)
Erythromycin | msrA* | Erythromycin resistance protein | CP003194 (54168-55634)
Erythromycin and clindamycin | ermA | rRNA adenine N-6-methyltransferase | BA000018.3 (56002-56733)
Erythromycin and clindamycin | ermB | rRNA adenine N-6-methyltransferase | AB699882.1 (4971-5708)
Erythromycin and clindamycin | ermC | rRNA adenine N-6-methyltransferase | HE579068 (7858-8592)
Erythromycin and clindamycin | ermT | 23S rRNA methylase | HF583292 (11344-12078)
Tetracycline | tetK | MFS tetracycline efflux pump | FN433596 (69118-70497)
Tetracycline | tetL | MFS tetracycline efflux pump | HF583292 (7713-9089)
Tetracycline | tetM | Ribosomal protection protein | CP002643 (427033-428952)
Vancomycin | vanA | Low-affinity peptidoglycan precursor | AE017171.1
Fusidic acid | fusB | Fusidic acid detoxification | CP003193.1 (1336-1977)
Fusidic acid | far* | Ribosome protection protein | AY373761.1 (19072-19713)
Trimethoprim | dfrA | Insensitive dihydrofolate reductase | CP002120 (2093303-2093788)
Trimethoprim | dfrG | Insensitive dihydrofolate reductase | FN433596 (502263-502760)
Gentamicin | aacA-aphD | 6'-aminoglycoside N-acetyltransferase/2"-aminoglycoside phosphotransferase | FN433596.1 (2209531-2210970)
Mupirocin (high-level resistance) | mupA | Isoleucyl-tRNA synthetase | HE579068 (2157-5231)
Mupirocin (high-level resistance) | mupB | Isoleucyl-tRNA synthetase | JQ231224

^a Presence of the gene correlates with phenotypic resistance. Reference sequences were obtained from published sequences from human clinical isolates.
^b An asterisk indicates that the gene was added in v2.0.
^c MFS, major facilitator superfamily.

Whole-genome sequencing, assembly, and resistance gene detection. Ethical approval for sequencing
S. aureus isolates from routine clinical samples and linkage to patient data without individual patient
consent in Oxford and Brighton in the United Kingdom was obtained from Berkshire Ethics Committee
(10/H0505/83) and the United Kingdom National Information Governance Board [8-05(e)/2010]. For all
isolates, DNA was extracted and sequenced using the Illumina HiSeq 2000 platform (San Diego, CA, USA)
as previously described (5). To assess sequencing quality, reads were mapped to reference strain MRSA
252 (GenBank accession no. BX571856.1) using Stampy v1.0.18 (6). MRSA 252 was chosen as it contains
staphylococcal cassette chromosome mec (SCCmec), has been capillary sequenced (7), and belongs to a
common United Kingdom methicillin-resistant S. aureus (MRSA) clone (EMRSA 16). To obtain whole genomes
for BLAST, reads were then de novo assembled using Velvet v1.0.18 (8). Samples were excluded if they
failed quality checks either on mapping (<70% coverage of reference genome after filtering) or
assembly (<50% of the genome in contigs >1 kb).
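
The two exclusion criteria above can be written as a small filter. The following is a minimal sketch rather than the pipeline used in the study: the function names are hypothetical, and it assumes that the proportion of the genome in contigs >1 kb is measured against the total assembled length, which the text does not specify.

```python
def fraction_in_long_contigs(contig_lengths, min_len=1000):
    """Fraction of assembled bases found in contigs longer than min_len bp.

    Assumption: the denominator is the total assembly length; the text does
    not say whether assembly length or reference length was used.
    """
    total = sum(contig_lengths)
    if total == 0:
        return 0.0
    return sum(n for n in contig_lengths if n > min_len) / total


def passes_quality_checks(reference_coverage, contig_lengths):
    """Apply both exclusion rules: mapping coverage and assembly contiguity."""
    # Exclude if <70% of the reference genome is covered after filtering.
    mapping_ok = reference_coverage >= 0.70
    # Exclude if <50% of the genome lies in contigs >1 kb.
    assembly_ok = fraction_in_long_contigs(contig_lengths) >= 0.50
    return mapping_ok and assembly_ok
```

A sample with 95% reference coverage and most bases in two megabase-scale contigs would pass, whereas the same assembly with 65% reference coverage would be excluded.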

Initial method development using the derivation set. The de novo-assembled genomes were interrogated
with BLAST+ (v2.2.28+) blastn and tblastn (9) to identify nucleotide sequences matching genes from the



TABLE 2 Panel of housekeeping genes with amino acid variants known to be associated with antimicrobial resistance^a

Antimicrobial agent | Gene | Amino acid substitutions^b | Reference gene accession no. (nucleotide positions)
Ciprofloxacin | gyrA | S84L, E88K, G106D, S85P, E88G, E88L | BX571857.1 (7005-9668)
Ciprofloxacin | grlA | S80F, S80Y, E84K, E84G, E84V, D432G, Y83N, A116E, I45M, A48T, D79V, V41G, S108N | BX571857.1 (1386869-1389271)
Ciprofloxacin | grlB | R470D*, E422D*, P451S*, P585S*, D443E*, R444S* | BX571857.1 (1384872-1386869)
Fusidic acid | fusA | A160V*, A376V, A655E, A655P*, A655V*, A67T*, A70V*, A71V*, B434N, C473S*, D189G*, D189V*, D373N*, D463G*, E233Q*, E444K, E444V*, E449K*, F441Y, F652S*, G451V, G452C, G452S, G556S, G617D, G664S, H438N, H457Q, H457Y, L430S*, L456P, L461K, L461S, M161I, M453I, M651I, P114H, P404L, P404Q, P406L, P478S, Q115L, R464C, R464H, R464S, R659C, R659H, R659L, R659S, R76C*, S416P*, T385N, T387I, T436I, T656K, V607I, V90A, V90I, Y654N* | BX571857.1 (577685-579766)
Rifampin | rpoB | A473T*, A477D, A477T*, A477V, D471G*, D471Y, D550G, H481D, H481N, H481Y, I527F, I527L*, I527M*, ins 475H, ins G475*, L466S*, M470T*, N474K*, Q456K, Q468K, Q468L, Q468R, Q565R*, R484H, S463P, S464P, S486L, S529L* | BX571857 (568813-572364)
Trimethoprim | dfrB | F99Y, F99S, F99I, H31N, L41F, H150R, L21V*, N60I* | BX571857.1 (1464014-1464493)

^a All housekeeping gene sequences were obtained from the genome of reference strain MSSA 476 (7).
^b An asterisk indicates that the amino acid substitution was reported in association with other variants. For the expected effect of each variant/combination on MIC, please see the supplemental data. ins, insertion.
TABLE 3 Source of isolates and method of susceptibility testing

Collection | No. of isolates | Source^a | Specimen type | Dates | Susceptibility testing methods | Sequencing
Derivation set collections (n = 501) | 88 | Brighton (all wards) | Blood | 1999-2007 | Automated broth dilution (Vitek), clinical laboratory | Previously sequenced
Derivation set collections (n = 501) | 90 | Brighton carriage (ITU) | Nasal swab | 2010-2011 | Disc diffusion, clinical laboratory | Previously sequenced
Derivation set collections (n = 501) | 323 | Oxford (all wards) | Blood | 2008-2011 | Disc diffusion, clinical laboratory | Previously sequenced
Validation set collections (n = 491) | 102 | Oxford carriage (ITU) | Nasal swab | 2009 | Disc diffusion and automated broth dilution (BD Phoenix) | Previously sequenced
Validation set collections (n = 491) | 100 | Oxford carriage (community) | Nasal swab | 2009 | Disc diffusion and automated broth dilution (BD Phoenix) | Previously sequenced
Validation set collections (n = 491) | 165 | Brighton (all wards) | Blood | 2011-2012 | Disc diffusion and automated broth dilution (BD Phoenix) | Sequenced de novo, susceptibility testing performed from same subculture as sequencing
Validation set collections (n = 491) | 124 | Oxford (all wards) | Blood | 2011-2012 | Disc diffusion and automated broth dilution (BD Phoenix) | Sequenced de novo, susceptibility testing performed from same subculture as sequencing

^a Brighton, Brighton and Sussex University Hospitals NHS Trust; Oxford, Oxford Radcliffe Hospitals NHS Trust; ITU, intensive therapy unit.



panel and their matching protein sequences, respectively. The parameters for the two programs were as
follows: for blastn, word size of 17, gap opening penalty of 5, and gap extension penalty of 2; for
tblastn, word size of 3, gap opening penalty of 11, and gap extension penalty of 1. The E value cutoff
was set at 0.001. Relative coverage was defined as the product of the proportion of reference allele
matched and the sequence identity of the match. For the initial algorithm (v1.0), a relative coverage
threshold of >80% was chosen to define gene presence with a high degree of similarity to the
reference, based on pilot data (for example, 95% relative coverage may be 95% of the gene length with
100% identity or 100% of the gene length with 95% identity). For housekeeping genes, where resistance
is conferred by one or more point mutations, differences between the tblastn result and the query
protein sequence were compared to the sequence of the wild-type protein and checked against the
catalogue of known antimicrobial resistance-encoding mutations compiled above. Changes in protein
sequence (at the same or different codons) which were not previously reported as conferring resistance
were counted as susceptible.
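
The relative coverage definition and the housekeeping-gene rule can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the inputs are assumed to have been parsed from BLAST output already, and the substitution comparison assumes a gap-free alignment of equal-length protein sequences.

```python
def relative_coverage(matched_length, reference_length, identity):
    """Proportion of the reference allele matched multiplied by match identity."""
    return (matched_length / reference_length) * identity


def gene_present(matched_length, reference_length, identity, threshold=0.80):
    """v1.0 rule: a mobile gene is called present above 80% relative coverage."""
    return relative_coverage(matched_length, reference_length, identity) > threshold


def substitutions(wild_type, observed):
    """Name each amino acid difference in the style of Table 2, e.g. 'S84L'.

    Assumption: both sequences are aligned without gaps and numbered from 1.
    """
    return [
        f"{wt}{position}{obs}"
        for position, (wt, obs) in enumerate(zip(wild_type, observed), start=1)
        if wt != obs
    ]


def predict_from_housekeeping(wild_type, observed, known_resistance):
    """Resistant only if a catalogued substitution is found; novel changes count as susceptible."""
    found = substitutions(wild_type, observed)
    return "R" if any(s in known_resistance for s in found) else "S"
```

For example, `relative_coverage(95, 100, 1.0)` and `relative_coverage(100, 100, 0.95)` both give 0.95, matching the two readings of 95% relative coverage described in the text.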

To determine the diversity of the isolates tested, in silico prediction of multilocus sequence type
(MLST) was also performed using BLAST+. The S. aureus MLST alleles were extracted from assemblies based
on sequence similarity to allele 1 for each locus, and the online MLST database
(http://saureus.mlst.net/) was used to predict the ST.

The initial development of the algorithm was not done in a blind manner, using 501 clinical S. aureus
isolates which had been sequenced and phenotyped previously and whose WGS and resistance data were
available ("derivation set"). To ensure a representative range of sequence types, isolates were
identified from bacteremia and carriage collections held at the Oxford Radcliffe Hospitals NHS Trust
and Brighton and Sussex University Hospitals NHS Trust, spanning a period of 13 years (Table 3) (10).
The collection included 159 MRSA isolates (32%). All isolates had been tested at each site by the
routine clinical laboratories for resistance to a standard first-line panel of antimicrobial agents
(penicillin, methicillin, erythromycin, vancomycin, ciprofloxacin, tetracycline, gentamicin, fusidic
acid, and rifampin at both sites; mupirocin and clindamycin for Brighton isolates only; trimethoprim
for Oxford isolates only). In the Brighton clinical laboratory, susceptibility testing was performed
using the Vitek automated system (bioMérieux, Basingstoke, United Kingdom), and in the Oxford clinical
laboratory, isolates were phenotyped by disc diffusion (11). The susceptibility testing results were
retrieved electronically from laboratory databases. Methicillin resistance was tested using cefoxitin
(Brighton) or oxacillin (Oxford).



Comparison of phenotype and predicted genotypic resistance. The predicted susceptibilities based on
mobile and chromosomal genetic elements, using the whole-genome sequences of the isolates in the
derivation set, were compared with the routine laboratory susceptibility testing results. Where there
was a mismatch between genotypic prediction and recorded phenotype, isolates were retrieved from
storage at -80°C and underwent repeat susceptibility testing by gradient diffusion using EUCAST
breakpoints (http://www.eucast.org/clinical_breakpoints/) to resolve the phenotype ("discordant
repeat"). A very major error (VME) was defined as a susceptible genotype with a resistant phenotype,
and a major error (ME) was defined as a resistant genotype with a susceptible phenotype.
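
The error definitions translate directly into a classification rule. The sketch below is illustrative only; the function names and the "R"/"S" encoding are assumptions made for the example.

```python
from collections import Counter


def classify(genotype, phenotype):
    """Compare one genotypic prediction with one phenotype ('R' or 'S')."""
    for call in (genotype, phenotype):
        if call not in ("R", "S"):
            raise ValueError(f"expected 'R' or 'S', got {call!r}")
    if genotype == phenotype:
        return "concordant"
    # Susceptible genotype with resistant phenotype is a very major error;
    # resistant genotype with susceptible phenotype is a major error.
    return "VME" if phenotype == "R" else "ME"


def tally(pairs):
    """Count outcomes over (genotype, phenotype) pairs."""
    return Counter(classify(g, p) for g, p in pairs)
```

Calling `tally([("S", "R"), ("R", "S"), ("R", "R")])` counts one VME, one ME, and one concordant result.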

Revision of bioinformatics algorithm. Using the results obtained above, we further examined
genotype-phenotype mismatches to identify whether algorithm improvements could be made. We extended the
gene panel by manually searching references from the original search and added two additional genes to
the panel (msrA and far) and added to the list of variants for fusA. We noted a high VME rate for
penicillin and fusidic acid which was reduced by adjusting the algorithm quality filters to accept
short or low-coverage contigs for blaZ, fusB, and far (see Results for details). We estimated
sensitivity and specificity with different thresholds for the relative coverage required for these
genes to be considered present in the derivation set. For the revised algorithm (v2.0), the best
compromise between overall sensitivity and specificity was obtained by defining resistance as >30%
relative coverage for blaZ, fusB, and far and as >80% relative coverage for the remaining mobile genes.
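
The threshold comparison described above can be sketched as a simple sweep. This is a hedged illustration, not the analysis performed in the study: the records are invented toy values, and the function names are hypothetical. Only the 30% and 80% thresholds and the three gene names come from the text.

```python
LOW_THRESHOLD_GENES = {"blaZ", "fusB", "far"}  # genes given the relaxed v2.0 cutoff


def threshold_for(gene):
    """v2.0 rule: >30% relative coverage for blaZ, fusB, and far; >80% otherwise."""
    return 0.30 if gene in LOW_THRESHOLD_GENES else 0.80


def sensitivity_specificity(records, threshold):
    """records: (relative_coverage, phenotype) pairs for one gene.

    Returns (sensitivity, specificity); a value is None when undefined.
    """
    tp = sum(1 for cov, ph in records if ph == "R" and cov > threshold)
    fn = sum(1 for cov, ph in records if ph == "R" and cov <= threshold)
    tn = sum(1 for cov, ph in records if ph == "S" and cov <= threshold)
    fp = sum(1 for cov, ph in records if ph == "S" and cov > threshold)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Toy data (hypothetical): a fragmented gene in a resistant isolate is missed
# at 0.80 but detected at 0.30, at the cost of one false positive.
toy = [(0.35, "R"), (0.95, "R"), (0.10, "S"), (0.40, "S")]
for t in (0.30, 0.80):
    print(t, sensitivity_specificity(toy, t))
```

With these toy values the lower threshold gives sensitivity 1.0 and specificity 0.5, and the higher threshold gives sensitivity 0.5 and specificity 1.0, which mirrors the trade-off the revision had to balance.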

Blind validation of revised prediction method v2.0. Because repeat susceptibility testing and revision
of the genotype prediction algorithm were done with a priori knowledge of the phenotype, to validate
the method, we applied v2.0 to a further 491 isolates (the "validation set" [Table 3]), with no
previous information regarding the expected phenotype available before genotypic interrogation (i.e.,
blind to phenotype).

A total of 202 isolates were sourced from carriage collections (12) which had previously been
sequenced but not phenotyped. These isolates were retrieved from storage at -80°C for resistance
testing. A further 289 isolates were obtained from archived bloodstream collections at the Oxford and
Brighton sites. For these, single colonies were plated on Columbia blood agar and grown at 37°C for 18
to 24 h. All the bacteria on the plate were harvested and suspended in 1.5 ml physiological saline. A
portion (0.1 ml) was removed and replated onto Columbia blood agar and incubated overnight at 37°C for
resistance phenotyping, and the remaining suspension was used to prepare DNA for WGS.

To control for differences in phenotypic testing methods, all 491 isolates had antimicrobial
susceptibility testing performed by M. Morgan using the Phoenix automated microbiology system (BD
Biosciences, Sparks, MD, USA) for a standard panel of antimicrobial agents (penicillin, methicillin,
ciprofloxacin, erythromycin, fusidic acid, gentamicin, mupirocin, rifampin, tetracycline, and
vancomycin). Isolates were also tested by disc diffusion (13) by K. Cole and J. R. Price (Brighton
isolates, panel as described above) and N. C. Gordon (Oxford isolates, panel as described above plus
trimethoprim). Concordant results for the two methods were used as the final phenotype. Isolates that
were resistant to erythromycin were further tested for clindamycin resistance by disc diffusion and the
clindamycin D test (14). Where there was discordance between disc diffusion and BD Phoenix, repeat
testing was performed by the gradient diffusion method (Etest) using EUCAST breakpoints, and this value
was used as the phenotype.
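
The rule for deriving the reference phenotype from the two methods can be stated compactly. The sketch below is a simplified illustration with hypothetical names; it treats each result as a categorical "R" or "S" call and does not model MIC interpretation against breakpoints.

```python
def final_phenotype(phoenix, disc_diffusion, etest=None):
    """Return the reference phenotype ('R' or 'S') for one isolate and one agent.

    Concordant Phoenix and disc diffusion results are used directly; discordant
    results are resolved by the gradient diffusion (Etest) result.
    """
    if phoenix == disc_diffusion:
        return phoenix
    if etest is None:
        raise ValueError("discordant results require a gradient diffusion result")
    return etest
```

For instance, `final_phenotype("R", "S", etest="S")` returns "S", while `final_phenotype("R", "R")` returns "R" without any repeat test.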

Whole-genome sequences were interrogated by BLAST+ using the revised parameters as described above,
performed in a blind manner (the phenotype not known) by T. Golubchik. Finally, the phenotypic and
genotypic profiles were compared by a separate investigator (T. E. A. Peto). The sensitivity,
specificity, and error rates (calculated as percentages of the total number of resistance tests) were
calculated against the concordant phenotype.
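
As a worked illustration of these summary measures, the sketch below computes them from the four cells of a genotype-phenotype comparison. The function name is hypothetical; the example counts are the ciprofloxacin values reported for the validation set (6 very major errors, 64 resistant by both methods, 420 susceptible by both methods, 1 major error).

```python
def performance(vme, both_resistant, both_susceptible, me):
    """Error rates as percentages of all tests, plus sensitivity and specificity.

    Sensitivity or specificity is None when no resistant or no susceptible
    phenotypes exist (reported as N/A in the tables).
    """
    total = vme + both_resistant + both_susceptible + me
    resistant = both_resistant + vme          # phenotypically resistant
    susceptible = both_susceptible + me       # phenotypically susceptible
    return {
        "vme_rate_pct": 100 * vme / total,
        "me_rate_pct": 100 * me / total,
        "sensitivity": both_resistant / resistant if resistant else None,
        "specificity": both_susceptible / susceptible if susceptible else None,
    }


cipro = performance(vme=6, both_resistant=64, both_susceptible=420, me=1)
print({name: round(value, 2) for name, value in cipro.items()})
```

Rounded, these counts give a very major error rate of 1.2%, a major error rate of 0.2%, a sensitivity of 0.91, and a specificity of 1.00.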

As in the derivation set, where there was a mismatch between the genotype and phenotype, isolates
underwent repeat susceptibility testing by gradient diffusion ("discordant repeat"). Possible
explanations for the remaining discordant results were explored by testing for penicillinase
production with nitrocefin discs (15) and manually inspecting sequences for the presumed
resistance-encoding genes or mutations to give an amended algorithm for future testing (v2.1).

Study accession number. The sequences reported in this paper have been deposited in the European
Nucleotide Archive Sequence Read Archive under study accession number ERP004655.

RESULTS 

Derivation set. WGS and routine laboratory antimicrobial susceptibility testing results were available
for 506 isolates, with 46 different sequence types found by in silico MLST. Using the initial v1.0
algorithm, 439 isolates (87%) had complete concordance between the genotype and phenotype for all
antimicrobial agents tested. The remaining 67 isolates had a total of 123 discrepancies. For five of
these isolates, the frozen bacterial stocks were found to be contaminated with other organisms (one
isolate with Proteus sp. and four isolates with coagulase-negative staphylococci), and further
confirmatory testing could not be performed. These isolates were excluded from further analysis,
leaving 501 isolates in the derivation set. Repeating the susceptibility testing for the isolates with
phenotype/genotype mismatches reduced the error rate, with a total of 71 discrepancies in the 49
isolates remaining (Fig. 1).

We noted a high VME rate for penicillin, with 9 (1.8%) isolates apparently resistant to penicillin
(MIC >0.12 mg/liter) in the absence of the blaZ gene based on >80% relative coverage in WGS data.
However, all 9 isolates were positive for blaZ by PCR. On manual inspection of the sequences, we found
that the blaZ gene was in fact present on small sequence contigs (<300 bp) or contigs with low coverage
(depth of cover <5) which did not meet the algorithm quality standard and had therefore been excluded.
Including these short or low-coverage contigs in the algorithm (v2.0) improved predictions (with
relative coverage as low as 30% associated with phenotypic resistance). This phenomenon was also noted
for fusidic acid, where the inclusion of low-quality contigs for the fusB and far genes improved
prediction of phenotype. This was not the case for the remaining genes associated with mobile elements
in the panel, where lowering the relative coverage threshold substantially increased false-positive
results.



FIG 1 Comparison of percentages of errors for the derivation and validation sets, illustrating the
change in error rate with repeat phenotyping and with optimized algorithm versions. Error rates for
resistant and susceptible isolates are shown for each step of algorithm development: in the derivation
set, the error rate was decreased by repeat susceptibility testing (discordant rpt). The VME rate was
reduced by adjusting the algorithm parameters (v2.0), although this resulted in a slight increase in
the ME rate. In the validation set, error rates were relatively low to start with and were improved
further by repeat testing of the discordants (v2.0 discordant rpt) and by incorporating the novel blaZ
mutations (v2.1).



Using this revised genotypic prediction method (v2.0), we found 31 persistent discrepancies in 28
isolates across all antimicrobials in the derivation set (Fig. 1 and Table 4). There were 22 very major
errors (0.4% VME rate) and 9 major errors (0.2% ME rate). The overall sensitivity and specificity were
0.98 (95% confidence interval [95% CI], 0.97 to 0.99) and 1.00 (95% CI, 0.99 to 1.00), respectively.

Blind validation set and genotypic prediction method v2.0. For the 491 isolates in the validation set,
61 different sequence types were identified by in silico MLST. The most frequent sequence types were
ST22 (13%) and ST30 (12%). Fifty-seven isolates were MRSA (12%). There were 60 errors in 48 isolates
out of 5,193 antimicrobial susceptibilities tested (Fig. 1 and Table 5). There were 25 VMEs: 6 for
ciprofloxacin, 4 for erythromycin, 2 for clindamycin, 4 for fusidic acid, 3 for penicillin, 2 for
methicillin, 2 for gentamicin, and 2 for trimethoprim. Of the 35 major errors, 25 were for penicillin.
The very major error rate was 0.5%, and the major error rate was 0.7%. The overall sensitivity was 0.97
(95% CI, 0.95 to 0.98), and the overall specificity was 0.99 (95% CI, 0.98 to 1). Further details for
each of the isolates with discrepant results are given in Tables 6 (penicillin) and 7 (other
antimicrobial agents).

(i) Penicillin. A total of 382 isolates were penicillin resistant (78%). Of the 3 very major errors
(Table 5), one isolate was susceptible on repeat testing (MIC of 0.06 mg/liter), while the other 2 were
confirmed as resistant by Etest (MICs of 0.5 mg/liter and 0.25 mg/liter).

Although the prototype genotypic prediction method gave a high rate of very major errors for
penicillin, using the lower relative coverage (>30%) threshold to define resistance for v2.0 resulted
in a high rate of major errors (5%), illustrating the inherent trade-off between sensitivity and
specificity for any algorithm.

TABLE 4 Derivation set results^a

Antimicrobial agent | Resistant by phenotype: susceptible by genotype | Resistant by phenotype: resistant by genotype | Susceptible by phenotype: susceptible by genotype | Susceptible by phenotype: resistant by genotype | Total no. of isolates | Very major error rate (%) | Major error rate (%) | Sensitivity (95% CI) | Specificity (95% CI)
Penicillin | 4 | 438 | 59 | 0 | 501 | 0.8 | 0 | 0.99 (0.98-1.00) | 1.00 (0.92-1.00)
Methicillin | 1 | 158 | 341 | 1 | 501 | 0.2 | 0.2 | 0.99 (0.9?-1.00) | 1.00 (0.98-1.00)
Ciprofloxacin | 7 | 165 | 326 | 3 | 501 | 1.4 | 0.6 | 0.96 (0.91-0.98) | 0.99 (0.97-1.00)
Erythromycin | 1 | 133 | 366 | 1 | 501 | 0.2 | 0.2 | 0.99 (0.95-1.00) | 1.00 (0.98-1.00)
Clindamycin | 0 | 88 | 89 | 0 | 177^c | 0 | 0 | 1.00 (0.95-1.00) | 1.00 (0.95-1.00)
Tetracycline | 0 | 28 | 473 | 0 | 501 | 0 | 0 | 1.00 (0.85-1.00) | 1.00 (0.99-1.00)
Vancomycin | 0 | 0 | 501 | 0 | 501 | 0 | 0 | N/A^d | 1.00 (0.99-1.00)
Fusidic acid | 3^b | 38 | 458 | 2 | 501 | 0.6 | 0.4 | 0.93 (0.79-0.98) | 1.00 (0.98-1.00)
Trimethoprim | 5 | 10 | 308 | 0 | 323 | 1.5 | 0 | 0.67 (0.39-0.87) | 1.00 (0.98-1.00)
Gentamicin | 0 | 7 | 494 | 0 | 501 | 0 | 0 | 1.00 (0.60-1.00) | 1.00 (0.99-1.00)
Mupirocin | 0 | 2 | 174 | 2 | 178 | 0 | 1.1 | 1.00 (0.20-1.00) | 0.99 (0.96-1.00)
Rifampin | 1 | 2 | 498 | 0 | 501 | 0.2 | 0 | 0.67 (0.13-0.98) | 1.00 (0.99-1.00)
Overall | 22 | 1,069 | 4,087 | 9 | 5,187 | 0.4 | 0.2 | 0.98 (0.97-0.99) | 1.00 (0.99-1.00)

^a Comparison of results for individual antimicrobial agents for 501 carriage/bacteremia isolates by phenotype (Vitek or disc diffusion) and predicted susceptibility using v2.0 genotypic prediction method. The result (resistant or susceptible) by phenotype refers to Vitek or disc diffusion results, and the result by genotype refers to the predicted susceptibility using the v2.0 genotypic prediction method.
^b Two isolates had two nonsynonymous mutations in fusA not previously described in the literature (T326I plus E468V and T326I plus V90I) which may be responsible for the observed phenotypes.
^c One isolate failed to grow for clindamycin testing.
^d N/A, not applicable.

Of the 25 isolates with major errors (blaZ positive but susceptible by initial phenotyping), 2 were
penicillin resistant when retested by Etest (MICs of 0.25 mg/liter) and were also positive for
penicillinase production. Eleven isolates were positive for penicillinase production at 10 min, and a
further 3 isolates were weakly positive (i.e., positive after 2 h). Nine remaining isolates with blaZ
had no penicillinase activity on nitrocefin disc testing and had susceptible MICs by gradient diffusion
(≤0.12 mg/liter). Inspection of the blaZ genes found single-base-pair insertions in 2 cases (positions
256 and 436, respectively, relative to the reference sequence), causing a frameshift in the translated
protein which may explain the lack of enzymatic activity. Similarly, 4 further isolates had identical
deletions at position 99, again resulting in a frameshift and premature termination and correlating
with a complete absence of penicillinase activity. We did not find any of these mutations in the
blaZ-positive, fully phenotypically resistant isolates. Revising the algorithm for blaZ alone to define
presence (>30% coverage) of blaZ with these insertions/deletions



TABLE 5 Validation set results^a

Antimicrobial agent | Resistant by phenotype: susceptible by genotype^b | Resistant by phenotype: resistant by genotype | Susceptible by phenotype: susceptible by genotype | Susceptible by phenotype: resistant by genotype^b | Total no. of isolates | Very major error rate (%) (95% CI) | Major error rate (%) (95% CI) | Sensitivity (95% CI) | Specificity (95% CI)
Penicillin | 3 (2) | 379 | 84 | 25 (9) | 491 | 0.6 (0.1-1.8) | 5.1 (3.3-7.4) | 0.99 (0.98-1.00) | 0.77 (0.68-0.84)
Methicillin | 2 (1) | 55 | 432 | 2 (1) | 491 | 0.4 (0.05-1.5) | 0.4 (0.05-1.5) | 0.96 (0.87-0.99) | 1.00 (0.98-1.00)
Ciprofloxacin | 6 (4) | 64 | 420 | 1 (0) | 491 | 1.2 (0.4-2.6) | 0.2 (0.05-1.1) | 0.91 (0.82-0.96) | 1.00 (0.98-1.00)
Erythromycin | 4 (2) | 79 | 405 | 3 (3) | 491 | 0.8 (0.2-2) | 0.6 (0.1-1.8) | 0.95 (0.87-0.98) | 0.99 (0.98-1.00)
Clindamycin | 2 (2) | 77 | 2 | 0 | 81 | 2.5 (0.3-8.6) | 0.0 (0-4.4) | 0.97 (0.90-1.00) | 1 (0.20-1.00)
Tetracycline | 0 | 18 | 471 | 2 (2) | 491 | 0.0 (0-0.7) | 0.4 (0.05-1.5) | 1.00 (0.78-1.00) | 1.00 (0.98-1.00)
Vancomycin | 0 | 0 | 491 | 0 | 491 | 0.0 (0-0.7) | 0.0 (0-0.7) | N/A^c | 1.00 (0.99-1.00)
Fusidic acid | 4 (4) | 39 | 448 | 0 | 491 | 0.8 (0.2-2) | 0.0 (0-0.7) | 0.91 (0.77-0.97) | 1.00 (0.99-1.00)
Trimethoprim | 2 (2) | 1 | 197 | 2 (1) | 202 | 1.0 (0.1-3.5) | 1.0 (0.1-3.5) | 0.33 (0.02-0.87) | 0.99 (0.96-1.00)
Gentamicin | 2 (2) | 2 | 487 | 0 | 491 | 0.4 (0.05-1.5) | 0.0 (0-0.7) | 0.50 (0.09-0.91) | 1.00 (0.99-1.00)
Mupirocin | 0 | 2 | 489 | 0 | 491 | 0.0 (0-0.7) | 0.0 (0-0.7) | 1.00 (0.20-1.00) | 1.00 (0.99-1.00)
Rifampin | 0 | 5 | 486 | 0 | 491 | 0.0 (0-0.7) | 0.0 (0-0.7) | 1.00 (0.46-1.00) | 1.00 (0.99-1.00)
Overall | 25 (19) | 644 | 4,410 | 35 (16) | 5,112 | 0.5 (0.3-0.7) | 0.7 (0.5-0.9) | 0.97 (0.95-0.98) | 0.99 (0.99-1.00)

^a Comparison of susceptibility results for 491 bacteremia and carriage isolates by phenotype (Phoenix/disc diffusion consensus result) and genotype prediction tool v2.0. The result (resistant or susceptible) by phenotype refers to Phoenix or disc diffusion consensus results, and the result by genotype refers to the predicted susceptibility using the v2.0 genotypic prediction method.
^b Figures in parentheses are numbers of isolates with discrepant phenotype confirmed on repeat testing.
^c N/A, not applicable.
TABLE 6 Results for MICs, nitrocefin disc testing, and blaZ variants for isolates with discrepant results for penicillin in the validation set by the v2.0 genotype prediction method

Isolate^a | In silico MLST | Initial consensus phenotype^b | MIC (mg/liter) | Nitrocefin disc test result, 10 min | Nitrocefin disc test result, 120 min | blaZ variant^c
C00001124 | ST45 | S | 0.12 | - | - | InsA436
C00001241 | ST30 | S | 0.12 | - | - | InsA256
C00001144 | ST30 | S | 0.12 | - | - | A99-
C00001203 | ST30/36 | S | 0.06 | - | - | A99-
C00013228 | ST30 | S | 0.06 | - | - | A99-
C00013375 | ST7 | S | 0.12 | - | - | A99-
C00001080* | ST22 | S | 0.12 | + | + | Similar to wild type
C00001092* | ST2417 | S | 0.12 | + | + | Similar to wild type
C00001112 | ST2438 | S | 0.25 | + | + | Similar to wild type
C00001148 | ST1 | S | 0.12 | + | + | Similar to wild type
C00001158 | ST582 | S | 0.12 | + | + | Similar to wild type
C00001142 | ST582 | S | 0.12 | + | + | Similar to wild type
C00001192 | ST2417 | S | 0.12 | + | + | Similar to wild type
C00001199 | ST30 | S | 0.12 | + | + | Similar to wild type
C00001205 | ST30 | S | 0.06 | + | + | Similar to wild type
C00001217 | ST30 | S | 0.12 | + | + | Similar to wild type
C00001231 | ST22 | S | 0.12 | + | + | Similar to wild type
C00001266 | ST188 | S | 0.12 | + | + | Similar to wild type
C00012754 | ST20 | S | 0.25 | + | + | Similar to wild type
C00001277 | ST2445 | S | 0.12 | - | + | Similar to wild type
C00013249 | ST45 | S | 0.06 | - | + | Similar to wild type
C00001093 | ST15 | S | 0.12 | - | + | Similar to wild type
C00001104 | ST8 | S | 0.023 | - | - | Similar to wild type
C00001111 | ST5 | S | 0.12 | - | - | Similar to wild type
C00001182 | ST5 | S | 0.12 | - | - | Similar to wild type
C00001147* | ST22 | R | 0.5 | | | Absent
C00012780 | ST97 | R | 0.25 | | | Absent
C00013232 | ST7 | R | 0.06 | | | Absent

^a Isolates with discrepancies for one or more other antimicrobial agents are indicated by an asterisk.
^b S, sensitive; R, resistant.
^c InsA436, insertion of A at position 436; A99-, deletion of A at position 99.



as susceptible (v2.1), only 3 isolates had wild-type blaZ with no
penicillinase production detected by any of the phenotypic test
methods, giving a major error rate of 0.6%, which is in keeping
with the observed error rates for the other antimicrobial agents
in this study.
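
The v2.1 adjustment for penicillin amounts to a simple decision rule. The sketch below is a minimal illustration, assuming that the presence of blaZ and any frameshift call (such as the InsA436, InsA256, and A99- variants in Table 6) have already been extracted from the assembly. The function name and arguments are hypothetical and are not taken from the published tool.

```python
def predict_penicillin(blaz_present, blaz_frameshift=False):
    """Return 'R' or 'S' for penicillin from blaZ status (illustrative rule).

    An intact blaZ predicts resistance; an absent gene or a frameshifted
    copy predicts susceptibility.
    """
    if blaz_present and not blaz_frameshift:
        return "R"
    return "S"


# Three patterns seen in Table 6: frameshifted, intact, and absent blaZ.
print(predict_penicillin(True, blaz_frameshift=True))
print(predict_penicillin(True))
print(predict_penicillin(False))

# Residual major error rate quoted in the text: 3 of 491 isolates.
print(f"{100 * 3 / 491:.1f}%")
```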

(ii) Methicillin. Fifty-seven isolates (12%) were MRSA by ini- 
tial phenotype. We found 2 very major errors (0.4%) and 2 major 
errors (0.4%) for methicillin. Of the very major errors, one isolate 
was susceptible on repeat testing (MIC of 0.5 mg/liter), and sim- 
ilarly, one of the isolates with a major error was resistant on repeat 
testing (MIC of 16 mg/liter), reflecting either sample heterogeneity
or more probably laboratory error. Both methicillin resistance
in the absence of mecA resulting from overexpression of penicillinase
(16) and methicillin susceptibility despite the presence of
mecA (5, 17) have been described previously and are potential
explanations for the 2 remaining errors. None of the isolates contained
mecC (18).

(iii) Ciprofloxacin. Seventy isolates were phenotypically resis- 
tant (14%). In both derivation and validation sets, the highest 
VME rate was seen for ciprofloxacin. For the validation set, this
was 6/491 (1.2%), with a major error rate of 0.2% (1/491). Two of
the 6 isolates with very major errors were susceptible on repeat 
testing. For the four other isolates with discrepant results, we ex- 
amined the grlB, gyrB, and norA genes but found no significant 
mutations (19, 20). The single isolate with a major error was found 
to be resistant on repeat testing (MIC of 2 mg/liter). Of the 64
isolates predicted to be resistant by genotype and resistant by phenotype,
60 had the amino acid substitutions S80F in grlA and S84L
in gyrA. Four isolates had S84L in grlA only.
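
A mutation-based call of this kind can be sketched as a lookup against a small panel. The example below is only an illustration: it assumes that amino acid substitutions have already been called per gene, that the panel contains just the two substitutions named in the text (the tool's actual panel is not reproduced here), and that any one listed substitution is enough to predict resistance. All names are hypothetical.

```python
# Illustrative panel: (gene, substitution) pairs named in the text above.
QUINOLONE_PANEL = {("grlA", "S80F"), ("gyrA", "S84L")}


def predict_ciprofloxacin(substitutions):
    """Return 'R' if any called substitution is in the panel, else 'S'."""
    return "R" if QUINOLONE_PANEL & set(substitutions) else "S"


print(predict_ciprofloxacin([("grlA", "S80F"), ("gyrA", "S84L")]))
print(predict_ciprofloxacin([("gyrA", "S84L")]))
print(predict_ciprofloxacin([]))
```

An isolate that is phenotypically resistant but carries no panel substitution is returned as 'S', which is how the very major errors described above arise.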

(iv) Erythromycin and clindamycin. A total of 83 (17%) iso- 
lates were resistant to erythromycin by phenotype. For these 83 
isolates, we observed 4 very major errors (0.8%). Two of these 
were susceptible on repeat testing (MICs of 0.75 mg/liter), but the 
other 2 were confirmed as resistant. There were 3 major errors
(0.6%), which were all confirmed as susceptible (MICs, 0.25 to 0.5
mg/liter). Two had ermC detected by BLAST, and one had ermA.
For the concordant resistant isolates, 37 had ermA alone, 37 had
ermC alone, 1 had ermT, 2 had msrA alone, 1 had ermC and msrA,
and 1 had ermC and ermA. None of the isolates had ermB. Out of
81 confirmed erythromycin-resistant isolates, 45 were resistant to
clindamycin by disc diffusion and 36 were susceptible. Of these 36
isolates, 34 had inducible resistance by D-test and contained either
ermA, ermC, or ermT as described above. The two isolates that
were resistant to erythromycin but susceptible to clindamycin
contained msrA only. There were 2 very major errors for clindamycin
(same isolates as for erythromycin VMEs). We did not observe any
correlation between erm variant and inducibility of clindamycin
resistance. We did not find vga(A)LC in any of the isolates (21).
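
The gene-to-phenotype pattern reported here (erm genes associated with erythromycin resistance and constitutive or inducible clindamycin resistance, msrA alone with erythromycin resistance only) can be written as a short rule. This is a sketch under stated assumptions: inducible clindamycin resistance is reported as resistant, the gene list is limited to the genes named in the text, and the function name is hypothetical.

```python
ERM_GENES = {"ermA", "ermB", "ermC", "ermT"}


def predict_macrolide_lincosamide(genes):
    """Return (erythromycin, clindamycin) calls from detected gene names."""
    detected = set(genes)
    has_erm = bool(detected & ERM_GENES)
    erythromycin = "R" if has_erm or "msrA" in detected else "S"
    # Assumption: an erm gene predicts clindamycin resistance,
    # whether constitutive or inducible.
    clindamycin = "R" if has_erm else "S"
    return erythromycin, clindamycin


print(predict_macrolide_lincosamide(["ermC"]))
print(predict_macrolide_lincosamide(["msrA"]))
print(predict_macrolide_lincosamide([]))
```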

(v) Tetracycline. Eighteen isolates were phenotypically resis- 
tant (4%). Two major errors were seen for tetracycline (0.4%), 
and both isolates were confirmed susceptible by Etest. One isolate 
had tetK detected by BLAST (MIC of 0.19 mg/liter), and the other
had tetM (MIC of 0.75 mg/liter). There were no very major errors. 
Of the 18 concordant resistant isolates, 16 had tetK, 1 had both 
tetK and tetL, and 1 had tetM. 

(vi) Vancomycin. Vancomycin resistance was not identified in 
our isolates either phenotypically or genotypically. Consequently, 
although the specificity of the method was 1.00, its sensitivity is 
not estimable due to the rarity of vanA-mediated resistance. To
investigate the possibility of intermediate resistance missed by 
phenotyping, post hoc we also screened the collection for recently 
published mutations in the yycG gene (22) found to be associated 
with intermediate MICs in laboratory mutants but did not iden- 
tify these mutations in any isolate. 
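
Why sensitivity cannot be estimated here follows directly from its definition: with no phenotypically resistant isolates, the denominator is zero. The sketch below makes this explicit. The counts are an assumption for illustration (all 491 validation isolates treated as tested and concordantly susceptible), and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# No resistant isolates by phenotype or genotype (assumed counts).
print(sensitivity_specificity(tp=0, fn=0, tn=491, fp=0))
```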

(vii) Fusidic acid. Forty-three isolates were phenotypically re- 
sistant (9%). Four very major errors were observed for fusidic 
acid. In 2 cases, manual examination of the sequences revealed 
single point mutations in the chromosomal fusA gene, predicted 
to result in the amino acid substitution of isoleucine for valine at 
position 90. This substitution has been reported in both suscepti- 
ble and resistant isolates (23), and therefore, its role in resistance is 
unresolved. One resistant isolate had a substitution of serine for 
threonine at position 656 (predicted sensitive on genotyping
based on a review of the literature). Since the substitution of lysine
for threonine at the same position is associated with full pheno- 
typic resistance (23), this may explain the discrepancy seen in this 
case. 

(viii) Trimethoprim. Phenotypic testing for trimethoprim re- 
sistance was performed for only 202 isolates, as it is not routinely 
tested in Brighton, and it was not part of the BD Phoenix panel 
used. Results are therefore taken from Oxford isolates, tested by 
disc diffusion testing only. Three isolates were phenotypically re- 
sistant (2%). There were 2 very major errors (1%) and 2 major 
errors. Both isolates with very major errors were susceptible on 
repeat testing (performed by disc diffusion only), and 1 isolate 
with a major error was resistant on repeat testing. 

TABLE 7 Details of antimicrobial discrepancies for validation set and genotype prediction tool v2.0

                                           Initial      MIC (mg/liter) on
Antimicrobial agent(s)    Isolate^a        phenotype^b  repeat testing^b   Genotyping details

Very major errors
Methicillin               C00001115*#      R            0.5 (S)            No mecA or mecC detected
                          C00001162*       R            16                 No mecA or mecC detected
Ciprofloxacin             C00013185        R            1.5                No significant gyrA or grlA mutation
                          C00001162*       R            2                  No significant gyrA or grlA mutation
                          C00001092*       R            2                  No significant gyrA or grlA mutation
                          C00001115*       R            2                  No significant gyrA or grlA mutation
                          C00001105#       R            <0.125 (S)         No significant gyrA or grlA mutation
                          C00001109#       R            <0.125 (S)         No significant gyrA or grlA mutation
Erythromycin              C00001147#       R            0.75 (S)           No erm or msrA gene detected
                          C00013384*       R            0.75 (S)           No erm or msrA gene detected
Erythromycin/clindamycin  C00001224        R/R          2/ND               No erm or msrA gene detected
                          C00001162*       R/R          3/ND               No erm or msrA gene detected
Fusidic acid              C00013212        R            R (disc only)      V90I mutation in fusA
                          C00013194        R            R (disc only)      V90I mutation in fusA
                          C00001259        R            R (disc only)      Wild-type fusA, no fusB or far detected
                          C00001276        R            R (disc only)      T656S substitution in fusA
Trimethoprim              C00001092*#      R            S (disc only)      Wild-type dfrB, no dfrA or dfrG detected
                          C00001235#       R            S (disc only)      Wild-type dfrB, no dfrA or dfrG detected
Gentamicin                C00001115*       R            24                 No aacA-aphD, aadD or aadE, or aphA3 detected
                          C00013331        R            3                  No aacA-aphD, aadD or aadE, or aphA3 detected

Major errors
Methicillin               C00001222#       S            16 (R)             mecA gene detected, 100% relative coverage
                          C00001080*       S            0.3                mecA gene detected, 100% relative coverage
Ciprofloxacin             [ID illegible]#  S            2 (R)              S80F mutation in grlA, S84L mutation in gyrA
Erythromycin              C00001249        S            0.25               ermA gene detected, 100% relative coverage
                          C00001189        S            0.25               ermC gene detected, 100% relative coverage
                          C00001080*       S            0.5                ermC gene detected, 100% relative coverage
Tetracycline              C00012796        S            0.19               tetK gene detected, 100% relative coverage
                          C00001247        S            0.75               tetM gene detected, 95% relative coverage
Trimethoprim              C00001240        S            S (disc only)      H31N mutation in dfrB, usually confers resistance
                          C00001123#       S            R (disc only)      dfrG gene detected, 100% relative coverage

^a Isolates with discrepancies for two or more antimicrobial agents are indicated by an asterisk after the isolate name. Isolates for which the phenotype matched the genotype on
repeat testing are indicated by a hash symbol (#) after the isolate name.
^b R, resistant; S, sensitive. Erythromycin and clindamycin were tested; the first phenotype or MIC is for erythromycin, and the second is for clindamycin. ND, not done.

(ix) Gentamicin. Four isolates were phenotypically resistant
(1%). There were 2 very major errors, and resistance was confirmed
by repeat testing (MICs of 24 mg/liter and 3 mg/liter). We
did not find aacA-aphD in either isolate, and they were also negative
for aadD, aadE, and aphA3. The low frequency of resistance
in this collection means the evaluation is underpowered, which
makes accurately estimating the overall sensitivity impossible.

(x) Mupirocin and rifampin. Two isolates (0.4%) and 5 isolates (1%)
were resistant to mupirocin and rifampin, respectively.
There were no errors observed; however, the relatively low resis- 
tance rates limit the power of the evaluation. Mupirocin resistance 
in both cases was attributable to mupA. 

DISCUSSION 

We have developed and tested a method for genotypic prediction 
of antimicrobial susceptibility phenotype from whole-genome sequencing
within a single species, using substantial derivation (n = 501)
and validation (n = 491) sets. Our goal was to develop a
method with direct relevance to clinical practice, and therefore, 
our prediction tool is based on genetic mechanisms that have been 
reported in clinical isolates, rather than including the very large 
number of potential resistance determinants in existing databases
such as ResFinder (24) and CARD (Comprehensive Antibiotic
Resistance Database) (25), many of which have not been pheno- 
typically verified in clinical isolates. Excluding phenotypic errors 
and isolates with multiple discrepancies, the final results show 
highly promising concordance, with a very major discrepancy rate 
of 8/652 (1.2%) and a major discrepancy rate of 13/4,423 (0.3%), 
comparable with the error rates for current phenotypic method- 
ologies (26, 27) and within the acceptable limits set by the U.S. 
Food and Drug Administration (28) for marketing approval of 
new susceptibility testing methods (<1.5% very major discrep- 
ancy rate, <3% major discrepancy rate). 
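
The comparison with the regulatory limits quoted above is a one-line calculation per error type. The sketch below uses only the counts and thresholds given in this paragraph; the variable and function names are illustrative.

```python
# Limits quoted in the text: <1.5% very major, <3% major discrepancy rate.
LIMITS = {"very major": 0.015, "major": 0.03}

# Final counts quoted in the text as (discrepancies, denominator).
OBSERVED = {"very major": (8, 652), "major": (13, 4423)}


def within_limits(observed, limits):
    """Map each error type to True if its observed rate is below the limit."""
    return {kind: errors / total < limits[kind]
            for kind, (errors, total) in observed.items()}


for kind, (errors, total) in OBSERVED.items():
    print(f"{kind}: {100 * errors / total:.1f}% "
          f"(limit {100 * LIMITS[kind]:.1f}%)")
print(within_limits(OBSERVED, LIMITS))
```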

In both the derivation and validation sets, a substantial num- 
ber (40%) of the initial observed errors were resolved by repeat 
phenotypic susceptibility testing. This is shown in Fig. 1, which 
illustrates the phenotype/genotype discrepancies on initial testing, 
after repeat phenotyping, and after adjustment of the algorithm 
for both the derivation and validation sets. Even when testing 
according to published guidelines, occasional errors in reagent 
and medium storage, incubation conditions, and inoculum den- 
sity may contribute to variation in observed phenotype, and sim- 
ilarly, inaccuracies in labeling, interpretation, and data entry are 
liable to occur in any system which is not fully automated. Some of
these factors may also contribute to the genotyping results, and 
this is supported by the fact that several isolates were discordant 
for multiple antimicrobial agents, suggesting a labeling or storage 
error. 

The most problematic antimicrobial agent was penicillin, with 
an unacceptably high very major error rate (1.8%) using v1.0 in
the derivation set and an unacceptably high major error rate (5%)
in the validation set using v2.0. This may be due to the variable
location of blaZ, which may be on a plasmid or integrated into the
chromosome (29). Isolates with chromosomally integrated blaZ
are likely to have average coverage in the sequencing reads, while 
isolates with plasmid-carried copies may have very high (if multi- 
ple copies are carried) or very low (because of poor mapping to the 
reference) coverage in that region. As a result, these regions may 
be rejected as poor quality by the assembly software, because they 
fall outside the coverage levels of the rest of the genome. This 
problem may be overcome in future with longer reads or alterna- 
tive methods for de novo assembly; however, our results highlight 
that relative coverage cutoff values may need to be set individually 
based on gene location. Conversely, in the validation set, we found 
that most of the major errors were due to a lack of concordance 
between penicillinase production and disc or broth dilution testing
(30). We also identified three novel blaZ frameshift mutations
that were associated with susceptibility despite the presence of the 
gene. Similar frameshift deletions have been reported in blaZ- 
positive, penicillin-susceptible isolates from cows with mastitis 
(31). 

Relatively high very major error rates (1.2%) were also seen for 
ciprofloxacin. Staphylococcal resistance to quinolones is predom- 
inantly due to point mutations in the grlA and gyrA genes (19). 
Low-level resistance may result from mutations in grlB or gyrB or 
alterations in expression of the efflux pump NorA (20). However, 
in most cases, these are in combination with grlA or gyrA muta- 
tions, and consequently, the overall phenotypic effect is difficult to 
predict. All isolates in this study contained the norA gene, but no 
correlation was seen between quinolone resistance and the norA 
gene or its known regulators, or the grlB or gyrB gene. Further 
studies may be able to elucidate the contribution of these to overall
quinolone resistance in S. aureus. 

There remains a very small subset of isolates where the genetic 
basis for an antimicrobial phenotype of the organism was not clear 
(15/48, excluding presumed phenotyping errors or identified 
novel variants). If not due to human error, these may be due to 
sequencing or assembly error, for example a miscalled base at a 
critical position. However, for S. aureus, our group has sequenced 
multiple replicates of strain MRSA 252 for comparison with a 
capillary sequenced high-quality reference sequence of this genome
(GenBank accession no. BX571856.1). The false-positive rate
(detection of a spurious variant) for our bioinformatics pipeline
was previously estimated in-house to be 2.5 × 10^-9
per nucleotide, i.e., 0.0075 per genome (32). Consequently, although
an incorrect susceptibility prediction may in theory occur
because of a sequencing error, in practice we anticipate that the 
impact of this will be extremely small. 
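
The per-genome figure follows from multiplying the per-nucleotide rate by the genome length. The check below assumes a per-nucleotide rate of 2.5 × 10^-9 and a round genome length of 3 Mb; both values are assumptions chosen because together they reproduce the quoted 0.0075 per genome, and the exact genome length used in the original estimate is not given here.

```python
rate_per_nucleotide = 2.5e-9   # assumed false-positive rate per nucleotide
genome_length = 3_000_000      # assumed round figure for an S. aureus genome (bp)

expected_false_variants = rate_per_nucleotide * genome_length
print(f"{expected_false_variants:.4f} spurious variants expected per genome")

# Equivalent statement: roughly one spurious variant per this many genomes.
print(round(1 / expected_false_variants))
```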

A more likely explanation for the discrepancies may be as 
yet unidentified alterations in regulatory regions or alternative 
resistance mechanisms. This highlights a major challenge for in 
silico resistotyping, since a query-based method cannot recognize
novel variants that are not in the relevant database. Furthermore,
gene expression is the result of complex interactions
between transcription promoters, repressors, and other regu- 
latory molecules, which may be remotely located from the gene 
itself. Phenotypic assays address these challenges by measuring 
the overall combined impact of all these mechanisms, although 
it is important to recognize that factors resulting in delayed 
transcription may cause isolates to be falsely identified as sus- 
ceptible, as we found for penicillin. 

Therefore, it is unlikely that WGS will be able to replace phe- 
notypic methods entirely, and some form of phenotypic surveil- 
lance will need to be maintained, for example based on clusters of 
treatment failures. However, the need for routine phenotyping 
should diminish as examination of WGS and phenotyping data 
for isolates with apparently novel mutations or with nonfunctioning
resistance genes elucidates the contribution of the underlying
genetics. This new knowledge can then be added to resistance 
determinant databases and absorbed into WGS investigations, as 
demonstrated by the improved specificity resulting from the in- 
corporation of the novel blaZ variants above into the algorithm. 
The huge potential of WGS data lies in its completeness: once 
sequenced, a genome can be accessed repeatedly to query for novel 
genes of interest as these arise (e.g., by our scan for recently pub- 
lished yycG gene mutations [22]). 

The cost and turnaround time for WGS have fallen rapidly in 
recent years, with the current full economic cost of sequencing a 
single isolate estimated to be less than £40 ($65) (33) compared 
with approximately £5 ($8) per sample using the BD Phoenix. 
Current turnaround times are directly comparable, with next- 
generation sequencers able to deliver results in 27 h (5) and the 
likelihood that this will be reduced to a matter of hours in the near 
future. The potential for full automation of WGS may also reduce 
human error, as described above. Further, the same WGS can also 
provide information about potential transmission (34), and the 
same methods as used to identify resistance determinants could be 
used to bioinformatically extract the presence/absence of viru- 
lence genes (35). 

The advances provided by WGS, combined with robust clinical 
outcome data, should greatly enhance our understanding of the 
genetic basis of antimicrobial resistance, with the potential for 
identifying new antimicrobial drug targets. The consequent 
promise of improved drug discovery in the face of current global 
concern regarding emerging antimicrobial resistance makes the 
prospect of routine use of WGS increasingly attractive. 